Ex_treme's blog.

Improving the Item-Based Collaborative Filtering Algorithm (ItemCF Improvements)

2018/11/21

Debugging Workflow

def main_flow():
    """
    Main flow of ItemCF.
    :return:
    """
    # Load user click data, e.g. {'1': ['1', '3', '6', '47', '50', '70', ...], ...}
    user_click, user_click_time = reader.get_user_click("/home/pzs741/PycharmProjects/CollaborativeFiltering/data/ratings.csv")
    item_info = reader.get_item_info("/home/pzs741/PycharmProjects/CollaborativeFiltering/data/movies.csv")
    # Compute item similarities, e.g. {'1': [('780', 0.55), ('3114', 0.54), ('356', 0.53), ...], ...}
    sim_info = cal_item_sim(user_click, user_click_time)
    debug_itemsim(item_info, sim_info)
    # Recommend items related to each user's recent clicks, e.g. {'1': {'780': 0.55, '3114': 0.54, '356': 0.53, ...}, ...}
    recom_result = cal_recom_result(sim_info, user_click)
    debug_recomresult(recom_result, item_info)
    # print(recom_result["1"])
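
The post does not show `cal_item_sim` itself, so the following is only a minimal sketch of how a baseline ItemCF similarity step is commonly written, not the author's implementation. The function name `cal_item_sim_sketch`, the toy data, and the cosine-style normalization are assumptions made for illustration; it also assumes each item appears at most once per user. The base contribution of 1 per co-click is presumably the quantity that the two improvements later in the post re-weight.

```python
import math
import operator


def cal_item_sim_sketch(user_click):
    """Hypothetical baseline: cosine-style item similarity from co-clicks."""
    co_appear = {}         # co_appear[i][j]: accumulated co-click contribution
    item_click_count = {}  # item_click_count[i]: number of users who clicked i
    for itemlist in user_click.values():
        for i in itemlist:
            item_click_count[i] = item_click_count.get(i, 0) + 1
            for j in itemlist:
                if i == j:
                    continue
                co_appear.setdefault(i, {})
                # Base contribution of 1 for every user who clicked both items
                co_appear[i][j] = co_appear[i].get(j, 0) + 1
    sim_info = {}
    for i, related in co_appear.items():
        # Normalize by the popularity of both items
        scores = {
            j: c / math.sqrt(item_click_count[i] * item_click_count[j])
            for j, c in related.items()
        }
        # Same shape as sim_info in main_flow: a list of (itemid, score), best first
        sim_info[i] = sorted(scores.items(), key=operator.itemgetter(1), reverse=True)
    return sim_info


# Toy click data (made up for this example)
toy_clicks = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["a", "c"]}
print(cal_item_sim_sketch(toy_clicks)["b"])
```

Item "b" ends up closer to "a" than to "c" here because two users clicked both "a" and "b", while only one clicked both "b" and "c".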

Debugging Code

  • Debugging code for item similarity
def debug_itemsim(item_info, sim_info):
    """
    Show item similarity info for one fixed item.
    :param item_info: dict, key: itemid, value: [title, genres]
    :param sim_info: dict, key: itemid, value: list of (itemid, simscore) tuples
    :return:
    """
    fixed_itemid = "1"
    # Guard against a missing item in either dict to avoid a KeyError
    if fixed_itemid not in item_info or fixed_itemid not in sim_info:
        print("invalid itemid")
        return
    title_fix, genres_fix = item_info[fixed_itemid]
    # Print the first five entries of the similarity list
    for zuhe in sim_info[fixed_itemid][:5]:
        itemid_sim = zuhe[0]
        sim_score = zuhe[1]
        if itemid_sim not in item_info:
            continue
        title, genres = item_info[itemid_sim]
        print(title_fix + "\t" + genres_fix + "\tsim:" + title + "\t" + genres + "\t" + str(sim_score))
  • Debugging code for user recommendation results
import operator


def debug_recomresult(recom_result, item_info):
    """
    Show the recommendation result for one fixed user.
    :param recom_result: dict, key: userid, value: dict of {itemid: recom_score}
    :param item_info: dict, key: itemid, value: [title, genres]
    :return:
    """
    user_id = "1"
    if user_id not in recom_result:
        print("invalid result")
        return
    # Print recommended items from highest to lowest score
    for zuhe in sorted(recom_result[user_id].items(), key=operator.itemgetter(1), reverse=True):
        itemid, score = zuhe
        if itemid not in item_info:
            continue
        print(",".join(item_info[itemid]) + "\t" + str(score))

Algorithm Improvement 1

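[Formula image not preserved. As the function below implements it, each user's co-click contribution is weighted by 1 / log10(1 + N(u)), where N(u) is the user's total number of clicks, so highly active users contribute less to item similarity.]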

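import math


def update_one_contribute_score(user_total_click_num):
    """
    ItemCF improvement 1: similarity contribution score based on user activity.
    :param user_total_click_num: total number of items the user clicked
    :return: contribution score; smaller for more active users
    """
    return 1 / math.log10(1 + user_total_click_num)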
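
To get a feel for how strongly this weighting penalizes active users, the snippet below evaluates the same expression for a few click counts. The click counts are illustrative values, not taken from the dataset. Note that the expression divides by zero when `user_total_click_num` is 0; this is presumably unreachable in practice, since a user needs at least two clicks to contribute an item pair, but that is an assumption about the calling code.

```python
import math


def update_one_contribute_score(user_total_click_num):
    # Same expression as in the post
    return 1 / math.log10(1 + user_total_click_num)


# Illustrative click counts (hypothetical users)
for clicks in (9, 99, 999):
    print(clicks, round(update_one_contribute_score(clicks), 3))
# 9 1.0
# 99 0.5
# 999 0.333
```

A user with about a thousand clicks therefore contributes roughly a third as much per co-click as a user with about ten.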

Algorithm Improvement 2

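[Formula image not preserved. As the function below implements it, the contribution of a co-click pair is weighted by 1 / (1 + Δt), where Δt is the gap between the two clicks in days, so items clicked close together in time count as more similar.]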

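def update_two_contribute_score(click_time_one, click_time_two):
    """
    ItemCF improvement 2: similarity contribution score based on click time gap.
    :param click_time_one: timestamp of the first click, in seconds
    :param click_time_two: timestamp of the second click, in seconds
    :return: contribution score; smaller for clicks further apart in time
    """
    delta_time = abs(click_time_two - click_time_one)
    total_sec = 60 * 60 * 24  # seconds per day
    delta_time = delta_time / total_sec  # time gap in days
    return 1 / (1 + delta_time)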
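
As a quick illustration of the decay, the snippet below evaluates the same expression for several gaps between two clicks. It assumes the timestamps are Unix timestamps in seconds, which is what the `60*60*24` divisor suggests; the base timestamp and the gaps are arbitrary example values.

```python
def update_two_contribute_score(click_time_one, click_time_two):
    # Same expression as in the post: convert the gap to days, then decay
    delta_days = abs(click_time_two - click_time_one) / (60 * 60 * 24)
    return 1 / (1 + delta_days)


base = 1_542_758_400  # arbitrary Unix timestamp in seconds (illustrative)
for gap_days in (0, 1, 3, 30):
    score = update_two_contribute_score(base, base + gap_days * 86400)
    print(gap_days, round(score, 3))
# 0 1.0
# 1 0.5
# 3 0.25
# 30 0.032
```

Two items clicked in the same session keep a contribution near 1, while a pair clicked a month apart contributes only about 3% of that.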

Debugging Results

  • Debugging results for improvement 1

Item similarity output
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Independence Day (a.k.a. ID4) (1996) Action|Adventure|Sci-Fi|Thriller 0.26483331033796265
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Mission: Impossible (1996) Action|Adventure|Mystery|Thriller 0.25257424492074837
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Forrest Gump (1994) Comedy|Drama|Romance|War 0.24822700380328572
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Star Wars: Episode IV - A New Hope (1977) Action|Adventure|Sci-Fi 0.24223482331738602
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Apollo 13 (1995) Adventure|Drama|IMAX 0.24046567927821974

Recommendation output
Independence Day (a.k.a. ID4) (1996),Action|Adventure|Sci-Fi|Thriller 0.26483331033796265
"Rock, The (1996)",Action|Adventure|Thriller 0.24942328299623753
Forrest Gump (1994),Comedy|Drama|Romance|War 0.24822700380328572
Twelve Monkeys (a.k.a. 12 Monkeys) (1995),Mystery|Sci-Fi|Thriller 0.2433757033839924
Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi 0.24223482331738602
Apollo 13 (1995),Adventure|Drama|IMAX 0.24046567927821974
Rumble in the Bronx (Hont faan kui) (1995),Action|Adventure|Comedy|Crime 0.22296919914379495
"Fugitive, The (1993)",Thriller 0.2179050017567612
Mission: Impossible (1996),Action|Adventure|Mystery|Thriller 0.2157569205071037
Father of the Bride Part II (1995),Comedy 0.19755051054639364
Broken Arrow (1996),Action|Adventure|Thriller 0.19460907594896107
Executive Decision (1996),Action|Adventure|Thriller 0.1944438733023261
"Birdcage, The (1996)",Comedy 0.19092767365291885
Twister (1996),Action|Adventure|Romance|Thriller 0.1897369150152543

  • Debugging results for improvement 2

Item similarity output
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Independence Day (a.k.a. ID4) (1996) Action|Adventure|Sci-Fi|Thriller 0.46475374613998177
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Toy Story 2 (1999) Adventure|Animation|Children|Comedy|Fantasy 0.43256226390108665
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Forrest Gump (1994) Comedy|Drama|Romance|War 0.42982096383633256
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Aladdin (1992) Adventure|Animation|Children|Comedy|Musical 0.4272105446504961
Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy sim:Star Wars: Episode IV - A New Hope (1977) Action|Adventure|Sci-Fi 0.4217381348100523

Recommendation output
Independence Day (a.k.a. ID4) (1996),Action|Adventure|Sci-Fi|Thriller 0.46475374613998177
Toy Story 2 (1999),Adventure|Animation|Children|Comedy|Fantasy 0.43256226390108665
Forrest Gump (1994),Comedy|Drama|Romance|War 0.42982096383633256
Aladdin (1992),Adventure|Animation|Children|Comedy|Musical 0.4272105446504961
Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi 0.4217381348100523
"Rock, The (1996)",Action|Adventure|Thriller 0.4166114199492754
Casino (1995),Crime|Drama 0.39177042306799037
Twelve Monkeys (a.k.a. 12 Monkeys) (1995),Mystery|Sci-Fi|Thriller 0.3834927290133812
Father of the Bride Part II (1995),Comedy 0.3706893668636461
Rumble in the Bronx (Hont faan kui) (1995),Action|Adventure|Comedy|Crime 0.3691490738192548
Sabrina (1995),Comedy|Romance 0.36722778250755456
Twister (1996),Action|Adventure|Romance|Thriller 0.355248820396473
"Nutty Professor, The (1996)",Comedy|Fantasy|Romance|Sci-Fi 0.35453382503684483
Primal Fear (1996),Crime|Drama|Mystery|Thriller 0.34343752127181054
"Cable Guy, The (1996)",Comedy|Thriller 0.33956136189636077
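
One rough way to compare the two sets of neighbors for Toy Story is to measure how many genres they share with it. The sketch below computes the mean Jaccard overlap of genre sets for the top-five lists printed above. The helper names `genre_jaccard` and `mean_overlap` are hypothetical, and genre overlap is only a sanity check chosen for this illustration, not a metric used in the post.

```python
def genre_jaccard(genres_a, genres_b):
    """Jaccard overlap between two '|'-separated genre strings."""
    a, b = set(genres_a.split("|")), set(genres_b.split("|"))
    return len(a & b) / len(a | b)


def mean_overlap(target, neighbors):
    """Average genre overlap between the target and its neighbors."""
    return sum(genre_jaccard(target, g) for g in neighbors) / len(neighbors)


toy_story = "Adventure|Animation|Children|Comedy|Fantasy"

# Genres of the five most similar items, copied from the outputs above
top5_improvement_1 = [
    "Action|Adventure|Sci-Fi|Thriller",    # Independence Day
    "Action|Adventure|Mystery|Thriller",   # Mission: Impossible
    "Comedy|Drama|Romance|War",            # Forrest Gump
    "Action|Adventure|Sci-Fi",             # Star Wars: Episode IV
    "Adventure|Drama|IMAX",                # Apollo 13
]
top5_improvement_2 = [
    "Action|Adventure|Sci-Fi|Thriller",              # Independence Day
    "Adventure|Animation|Children|Comedy|Fantasy",   # Toy Story 2
    "Comedy|Drama|Romance|War",                      # Forrest Gump
    "Adventure|Animation|Children|Comedy|Musical",   # Aladdin
    "Action|Adventure|Sci-Fi",                       # Star Wars: Episode IV
]

print(round(mean_overlap(toy_story, top5_improvement_1), 3))  # 0.132
print(round(mean_overlap(toy_story, top5_improvement_2), 3))  # 0.412
```

By this measure the neighbors produced with improvement 2 are closer in genre to Toy Story, mainly because Toy Story 2 and Aladdin enter the top five.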
